Managing textual descriptions based on quality rating

ABSTRACT

A system and method for managing reconciled textual descriptions based on merchant quality ratings are provided. Merchants providing items to an electronic marketplace submit merchant values corresponding to textual descriptions of attributes of the items. A merchant value analyzer component processes multiple merchant values to select a set of textual descriptions based on a comparison of the textual descriptions and on merchant quality ratings. Additionally, a merchant quality analyzer component calculates updated merchant quality ratings based on the submitted merchant values.

This application is a divisional of U.S. application Ser. No. 11/774,346 filed Jul. 6, 2007.

BACKGROUND

One aspect associated with the widespread usage of communication networks generally, and the Internet particularly, has been the emergence of electronic marketplaces. An electronic marketplace is typically a network site that offers an interface, such as a Web page, to consumers via a communication network. In one embodiment, the electronic marketplace can facilitate the sale of a wide variety of items that are offered by multiple merchants via the electronic marketplace. In this embodiment, the electronic marketplace can maintain a centralized catalog of items offered for sale by each merchant along with information about the specific items offered by the merchant. The centralized catalog approach provides consumers with a mechanism for comparing merchandise, comparing prices, and making their purchases from a wide variety of merchants all within the same electronic marketplace.

Given the potentially large volumes of items being offered for sale by a wide variety of merchants, the information concerning the items provided by the merchants becomes very important. In a specific example, two or more merchants may be offering the same item, or substantially the same item, for sale, which facilitates price comparison and purchasing diversity. At the same time, however, it can be difficult for the electronic marketplace to select the information that best describes each particular item, especially if there are differences between the information provided by each merchant. For example, each merchant may provide slightly different information that describes aspects of an item for sale, such as item title/name, descriptions, etc. Accordingly, the electronic marketplace may have to decide which information is provided to a consumer in the electronic marketplace, a process generally referred to as information reconciliation. Current approaches to information reconciliation rely on a manual analysis of each submission and/or an arbitrary selection of information from a designated merchant.

DESCRIPTION OF THE DRAWINGS

The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a pictorial diagram of an illustrative networked environment for facilitating item information reconciliation from a variety of third party merchants;

FIG. 2 is a block diagram of a framework for item information reconciliation based on a plurality of merchant values in accordance with an embodiment of the present invention;

FIG. 3 is a flow diagram of an item information reconciliation routine implemented by a service provider in accordance with an embodiment of the present invention; and

FIG. 4 is a flow diagram of a merchant item information quality determination routine implemented by a service provider in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Generally described, the present invention relates to the management of descriptions of items from a plurality of merchants. Specifically, in one aspect, the present invention relates to the selection and management of textual descriptions for specific items from an item catalog based, in part, on a rating of cumulative quality of the descriptions from each submitting merchant. In another aspect, the present invention relates to the determination of ratings for the cumulative quality of descriptions of submitting merchants based upon a comparison of item descriptions from a particular merchant relative to other submitting merchants. Although the present invention will be described with regard to illustrative examples, operating environment and component interaction, one skilled in the relevant art will appreciate that the disclosed embodiments should not be construed as limiting.

FIG. 1 provides an illustrative overview of a distributed computing environment 100 in which embodiments of the invention may be implemented. The environment 100 includes an electronic marketplace 102 that includes, but is not limited to, an item catalog database 104, a merchant server 106, and a merchant computing system 108. The electronic marketplace 102 allows consumers, via their client devices 118 a-118 c, to view and purchase items offered for sale or lease on the electronic marketplace 102. The item catalog database 104 is communicatively connected to the merchant server 106. The merchant server 106 provides an interface for each of the client devices 118 a-118 c. The item catalog database 104 includes information on all items offered for sale within the electronic marketplace 102. In an illustrative embodiment, each entry in the item catalog database 104 can correspond to a structured, or semi-structured, set of data describing various aspects of a particular item. For example, an item catalog database entry for a particular book for sale may include information regarding author, title, publisher, keywords, executive summaries, and the like.

With continued reference to FIG. 1, the merchant computing system 108 can include, but is not limited to, a merchant value analyzer component 110 and a merchant quality analyzer component 112. In an illustrative embodiment, the merchant value analyzer component 110 obtains a set of merchant submissions, e.g., merchant values, from each merchant selling an item and determines which merchant values adequately describe the item. The selection of a particular merchant value can be based on a calculated rating of the quality of each merchant's descriptions (e.g., a merchant quality score). In another illustrative embodiment, the merchant quality analyzer component 112 determines a merchant quality score for each merchant based on the quality of the information that the merchant provides.

The computing environment 100 includes third-party merchants 114 a and 114 b, which offer items for sale within the electronic marketplace 102. The computing environment 100 also includes client devices 118 a, 118 b, and 118 c, which consumers use to access products offered for sale within the electronic marketplace 102.

The electronic marketplace 102, third-party merchants 114 a and 114 b, and the client devices 118 a, 118 b, and 118 c are all communicatively connected to the network 116. The network 116 may be a Local Area Network (“LAN”), a larger network such as a Wide Area Network (“WAN”), or a collection of networks such as the Internet. Because protocols for network communication such as TCP/IP are well known to those skilled in the art of computer networks, further description of these protocols and of other methods used to implement the network 116 is not provided herein.

FIG. 2 is a block diagram depicting a process in which a merchant value is selected to describe a product being offered for sale by multiple merchants. In an illustrative embodiment, the merchant value can correspond to a textual description for an aspect of identified items, such as a technical specification for an electronic item. In another embodiment, the merchant value can correspond to a set of textual descriptions for multiple aspects of identified items, such as all or a substantial portion of the textual descriptions for the electronic item. With reference to FIG. 2, a product 202 corresponding to an item in the item catalog database 104 is offered for sale by Merchant 1 204 a, Merchant 2 204 b, and Merchant 3 204 c. In an illustrative embodiment, each Merchant 204 a-204 c submits a merchant value that corresponds to textual descriptions for various aspects of the product 202. The merchant value can also include no data, or empty sets, for any aspect for which a merchant has no textual description. For example, a merchant value for a publication could include a submission of textual descriptions by a merchant of title, author, publisher, ISBN identifier, keywords, summary, table of contents, and the like. One skilled in the art will recognize that there are a number of characteristics that can be used as merchant values. Thus, the characteristics listed here should not be construed as limiting.
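A merchant value of this kind can be represented as a simple record keyed by attribute. The following Python sketch is illustrative only; the class name `MerchantValue`, its fields, and the use of `None` for an empty description are assumptions rather than part of the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MerchantValue:
    """Hypothetical record for one merchant's submission about one item.

    Each attribute (e.g. "title", "author", "isbn") maps to the merchant's
    textual description, or to None when the merchant supplied no description
    for that aspect (the "empty set" case).
    """

    merchant_id: str
    item_id: str
    descriptions: Dict[str, Optional[str]] = field(default_factory=dict)

    def description_for(self, attribute: str) -> Optional[str]:
        # An attribute the merchant never mentioned is treated the same way as
        # one it explicitly left empty.
        return self.descriptions.get(attribute)


book_value = MerchantValue(
    merchant_id="merchant-1",
    item_id="book-42",
    descriptions={"title": "Example Title", "author": "A. Writer", "isbn": None},
)
```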

With continued reference to FIG. 2, Merchant 1 204 a submits merchant value #1 206 a. Merchant 2 204 b submits merchant value #2 206 b. Merchant 3 204 c submits merchant value #3 206 c. The merchant value analyzer component 110 receives merchant values 206 a, 206 b, and 206 c from the merchants and first determines that all three merchant values 206 a-206 c correspond to the same item in the item catalog, a process referred to as item matching. An example of a system for item matching of merchant values is described in co-pending, commonly assigned U.S. application Ser. No. 11/754,237, entitled DUPLICATE ENTRY DETECTION SYSTEM AND METHOD, and filed on May 25, 2007. U.S. application Ser. No. 11/754,237 is incorporated by reference herein.

Upon matching the submitted merchant values, the merchant value analyzer component 110 determines from the set of merchant values the textual descriptions that best describe the product 202. In an illustrative embodiment, the merchant value analyzer component 110 determines the textual descriptions based on a comparison of all the textual descriptions, weighted by a cumulative quality rating for each of the submitting merchants. In the diagram 200, a set of textual descriptions 206 is determined to best identify the product 202. A process for selecting specific textual descriptions will be discussed in more detail in conjunction with FIG. 3.

For example, assume that the target product corresponds to a Major League Baseball cap emblazoned with a particular team logo and that item catalog database 104 includes an entry for each separately identifiable cap. Merchant 1 (204 a) may submit a merchant value that describes the product as “Major League Baseball memorabilia.” Merchant 2 (204 b) may submit a merchant value that describes the product as a “baseball cap.” Further, Merchant 3 (204 c) may submit a merchant value that describes the product as a “Mariners Major League Baseball Cap.” While each of these merchant values includes textual descriptions that have information identifying the product, in an illustrative embodiment, the merchant value analyzer component 110 identifies which one of the textual descriptions from the submitted merchant values 206 a-206 c best describes the product. This identification is based on a comparison of the three submissions, “Major League Baseball memorabilia,” “baseball cap,” and “Mariners Major League Baseball Cap.” Additionally, the merchant value analyzer component 110 may further process the submissions by weighing a previously compiled rating of cumulative quality for each submitting merchant, Merchants 204 a-204 c. In this example, the merchant value analyzer component 110 determines that the textual description that best identifies the product was submitted by Merchant 3, “Mariners Major League Baseball Cap.” The merchant value analyzer component 110 then repeats the selection routine for each potential aspect of an identified item to generate a cumulative set of textual descriptions 206.
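The comparison in this example can be made concrete with a worked calculation. The similarity measure used here (Jaccard overlap of lowercase word sets, J) and the merchant ratings of 0.8, 0.6, and 0.9 for Merchants 1 to 3 are hypothetical assumptions chosen for illustration; the description does not specify either. D_i is the word set of Merchant i's description, q_i its average similarity to the other two descriptions, and w_i that average weighted by the merchant's rating.

```latex
\begin{aligned}
D_1 &= \{\text{major},\text{league},\text{baseball},\text{memorabilia}\},\quad
D_2 = \{\text{baseball},\text{cap}\},\quad
D_3 = \{\text{mariners},\text{major},\text{league},\text{baseball},\text{cap}\}\\
J(D_1,D_2) &= \tfrac{1}{5} = 0.2,\qquad J(D_1,D_3) = \tfrac{3}{6} = 0.5,\qquad J(D_2,D_3) = \tfrac{2}{5} = 0.4\\
q_1 &= \tfrac{0.2+0.5}{2} = 0.35,\qquad q_2 = \tfrac{0.2+0.4}{2} = 0.30,\qquad q_3 = \tfrac{0.5+0.4}{2} = 0.45\\
w_1 &= 0.35 \times 0.8 = 0.28,\qquad w_2 = 0.30 \times 0.6 = 0.18,\qquad w_3 = 0.45 \times 0.9 = 0.405
\end{aligned}
```

Under these assumptions, Merchant 3's description already has the highest average similarity, and weighting by the assumed ratings widens its lead, so it is selected.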

With reference now to FIG. 3, a flow diagram illustrative of a sample merchant value routine 300 used by the merchant value analyzer component 110 to determine a set of textual descriptions from the submitted, reconciled merchant values for an identified product will be described. As previously mentioned, a submitted merchant value can include textual descriptions for different aspects of a product. With reference to FIG. 3, at block 302, the merchant value analyzer component 110 obtains a set of products for which textual descriptions will be selected. In an illustrative example, the merchant value analyzer component 110 may receive a set of merchant values from any number of submitting merchants. Based on the merchant values, the merchant value analyzer component 110 may select textual descriptions for the identified products or update existing textual descriptions based on the newly submitted merchant values.

At block 304, a first product is identified. At block 306, a merchant value for the individual product identified in block 304 is determined. In one embodiment, the merchant value corresponds to the submission of textual descriptions for a set of available attributes/aspects of an identified item. For example, a merchant value for an electronic item may include textual descriptions for title/identifier, manufacturer, dimensions, color, weight, reviews, technical support, etc. At block 308, it is determined whether this is the last merchant selling the individual product. If it is determined at block 308 that the merchant is not the last merchant selling the product, the routine 300 returns to block 306. As previously described, the merchant value analyzer component 110 can implement a matching routine to determine which merchant values correspond to the same, or substantially the same, item.

If it is determined at block 308 that this is the last merchant selling the product, the merchant value analyzer component 110 has compiled the set of all merchant value submissions and the routine 300 proceeds to block 310. At block 310, the merchant value analyzer component 110 determines a data quality score for each textual description within a merchant value based on the similarity of that textual description to the corresponding textual descriptions in the other merchant values in the compiled set of all merchant value submissions. For example, if the set of submitted merchant values corresponds to merchant value submissions from four merchants, the merchant value analyzer component 110 compares a textual description from a target merchant value to the corresponding textual descriptions from the other three submitted merchant values. The resulting cumulative data quality score for the particular textual description would be calculated as the average of the individual data quality scores from each comparison. The merchant value analyzer component 110 could then repeat the process for the other submitted merchant values.
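The averaging step at block 310 might be sketched as follows for a single attribute of a single product. The function names and the word-overlap similarity measure are hypothetical; the description leaves the choice of comparison algorithm open. Merchants that supplied no description for the attribute are skipped, and a description with no peers to compare against is given a neutral score of 1.0, which is likewise an assumption.

```python
from statistics import mean
from typing import Dict, Optional


def token_similarity(a: str, b: str) -> float:
    """Hypothetical similarity in [0, 1]: overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def cumulative_data_quality(descriptions: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Map each merchant to the average similarity between its description
    and every other merchant's description of the same attribute."""
    present = {m: d for m, d in descriptions.items() if d}
    scores: Dict[str, float] = {}
    for target, text in present.items():
        comparisons = [
            token_similarity(text, other_text)
            for other, other_text in present.items()
            if other != target
        ]
        # Assumption: a lone description has nothing to disagree with.
        scores[target] = mean(comparisons) if comparisons else 1.0
    return scores


# Four merchants, one of which submitted no title for this product.
titles = {
    "merchant-1": "Major League Baseball memorabilia",
    "merchant-2": "baseball cap",
    "merchant-3": "Mariners Major League Baseball Cap",
    "merchant-4": None,
}
scores = cumulative_data_quality(titles)
```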

In one embodiment, the data quality score can correspond to a numerical value selected from a range of values. For example, a data quality score can be selected from a range of 0 to 1, in which a value of “1” represents identical values and a value of “0” represents no similarity in value. In an illustrative embodiment, the merchant value analyzer component 110 can utilize any one of a variety of string comparison algorithms to generate data quality scores.
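Two string comparison measures that produce values in the 0 to 1 range are sketched below. `difflib.SequenceMatcher.ratio()` is part of the Python standard library and compares character sequences; the word-set (Jaccard) measure is a common alternative. Neither is asserted to be the algorithm used by the described system, and the lowercasing step is an assumption.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity via difflib: 2*M / T, always in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a: str, b: str) -> float:
    """Word-level Jaccard similarity: size of the intersection of the
    lowercase word sets divided by the size of their union."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    # Two empty descriptions are treated as identical.
    return len(words_a & words_b) / len(union) if union else 1.0
```

Character-level comparison tolerates small spelling differences, while the word-set measure ignores word order; which behaves better likely depends on the attribute being compared.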

At block 312, the cumulative data quality score for each textual description can be further weighted by the rating of cumulative quality for each submitting merchant. The rating of cumulative quality, also referred to as the merchant quality score, is determined for each merchant based on the overall quality of information provided by that merchant. As stated earlier, an electronic marketplace can contain varying numbers of third-party merchants. Generally, a merchant value that is common amongst a majority of the merchants within the electronic marketplace has a higher probability of being correct. However, there could also be cases where a majority of the merchants have incorrectly or inadequately identified a product. Additionally, a smaller marketplace with only two or three merchants is at greater risk of improperly identifying an item. In an illustrative embodiment, the rating of cumulative quality for each merchant can be represented as a numerical value that is multiplied by the data quality score for each textual description in a merchant value. The calculation of the rating of cumulative quality for each merchant by the merchant quality analyzer component 112 will be discussed in more detail in conjunction with FIG. 4.
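The weighting at block 312 reduces to a per-merchant multiplication. In the sketch below, the function name is hypothetical, and giving an unrated merchant a default weight of 1.0 is an assumption that mirrors the initial score discussed with FIG. 4. The example shows how a well-rated merchant can outrank a poorly rated one even when its raw similarity score is lower.

```python
from typing import Dict


def weight_by_rating(
    data_quality: Dict[str, float],
    merchant_ratings: Dict[str, float],
    default_rating: float = 1.0,
) -> Dict[str, float]:
    """Multiply each merchant's cumulative data quality score for a textual
    description by that merchant's rating of cumulative quality."""
    return {
        merchant: score * merchant_ratings.get(merchant, default_rating)
        for merchant, score in data_quality.items()
    }


# Merchant A's description agrees more with its peers, but merchant A has a
# poor track record, so merchant B's description ends up ranked higher.
weighted = weight_by_rating({"A": 0.6, "B": 0.5}, {"A": 0.4, "B": 0.9})
```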

At block 314, the merchant value analyzer component 110 sorts the weighted data quality scores and selects a textual description for each available attribute of the identified product using those scores. At decision block 318, a determination is made as to whether this is the last product to be described. If it is determined that the current product is not the last product to be described, the routine 300 proceeds to block 320, where the next product is obtained. From there, the routine 300 moves back to block 306. If it is determined that the current product is the last product to be described, the routine 300 terminates at block 322.
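Blocks 304 through 322 can be combined into a single loop over products and attributes. The Python sketch below assumes that item matching has already grouped the merchant values by product; the data layout, function names, word-overlap similarity, default rating of 1.0 for unrated merchants, and the example ratings are all illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Optional

# attribute -> textual description (None when the merchant supplied none)
Submission = Dict[str, Optional[str]]


def token_similarity(a: str, b: str) -> float:
    """Hypothetical similarity in [0, 1]: overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def select_for_product(
    submissions: Dict[str, Submission], ratings: Dict[str, float]
) -> Dict[str, str]:
    """Pick one description per attribute for one product (blocks 310-314)."""
    attributes = {attr for values in submissions.values() for attr in values}
    selected: Dict[str, str] = {}
    for attr in sorted(attributes):
        present = {m: v[attr] for m, v in submissions.items() if v.get(attr)}
        if not present:
            continue  # no merchant described this attribute
        candidates = []
        for merchant, text in present.items():
            peers = [token_similarity(text, t) for o, t in present.items() if o != merchant]
            quality = mean(peers) if peers else 1.0  # block 310
            weighted = quality * ratings.get(merchant, 1.0)  # block 312
            candidates.append((weighted, text))
        candidates.sort(key=lambda c: c[0], reverse=True)  # block 314
        selected[attr] = candidates[0][1]
    return selected


def reconcile_catalog(
    products: Dict[str, Dict[str, Submission]], ratings: Dict[str, float]
) -> Dict[str, Dict[str, str]]:
    """Repeat the selection for every product (blocks 304 and 318-322)."""
    return {product: select_for_product(subs, ratings) for product, subs in products.items()}


catalog = reconcile_catalog(
    {
        "cap-1": {
            "merchant-1": {"title": "Major League Baseball memorabilia"},
            "merchant-2": {"title": "baseball cap"},
            "merchant-3": {"title": "Mariners Major League Baseball Cap", "color": "navy"},
        }
    },
    ratings={"merchant-1": 0.8, "merchant-2": 0.6, "merchant-3": 0.9},
)
```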

FIG. 4 is a flow diagram of a sample merchant quality routine 400 that determines a rating of cumulative quality for each merchant in the electronic marketplace 102. At block 402, the merchant quality analyzer component 112 determines all merchants within the electronic marketplace 102 that have submitted merchant values. Each merchant is given an initial merchant quality score at block 404. In one embodiment, all merchants are initially considered to provide quality information. For instance, each merchant can initially be awarded a merchant quality score of “1,” which is representative of the highest data quality score as described above. As the merchant continues to provide merchant values, its score will adjust according to the quality of the information it provides. In another embodiment, a new merchant may be assigned an initial data quality score equal to an average of the scores of other merchants, or may be allocated an initial data quality score corresponding to that of a merchant determined to be similar to the new merchant. As the new merchant continues to provide merchant values, its score will adjust according to the quality of the information it provides, in the same manner as for pre-existing merchants.
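The initialization choices at block 404 might be expressed as follows. The strategy names and function signature are hypothetical; the three options correspond to the optimistic score of 1, the average of existing merchants, and the score of a similar merchant. How a similar merchant is identified is not specified in the description, so the sketch simply takes its identifier as input, and the fallback to 1.0 when no other merchants exist is an assumption.

```python
from statistics import mean
from typing import Dict, Optional


def initial_rating(
    existing: Dict[str, float],
    strategy: str = "optimistic",
    similar_to: Optional[str] = None,
) -> float:
    """Return a starting merchant quality score in the 0-to-1 range."""
    if strategy == "optimistic":
        # Every new merchant is presumed to provide quality information.
        return 1.0
    if strategy == "average":
        # Average of the other merchants; assumed fallback of 1.0 if none exist.
        return mean(existing.values()) if existing else 1.0
    if strategy == "similar":
        if similar_to is None or similar_to not in existing:
            raise ValueError("a known similar merchant must be supplied")
        return existing[similar_to]
    raise ValueError(f"unknown strategy: {strategy!r}")
```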

At block 406, the merchant quality analyzer component 112 identifies a next merchant. The data quality scores of all the textual descriptions in the merchant values submitted by the merchant are obtained at block 408. For example, if the identified merchant has submitted 10 merchant values, the merchant quality analyzer component 112 will determine the data quality scores for each of the textual descriptions in those merchant values. These data quality scores allow the merchant quality analyzer component 112 to determine which merchants in the electronic marketplace submit quality information. A merchant who has high data quality scores submits good information concerning products, which results in a better merchant quality score. Conversely, a merchant who has low data quality scores submits poor information concerning products and will have a lower merchant quality score.

At block 410, a rating of cumulative quality is determined based on the data quality scores obtained in block 408. For example, in an illustrative embodiment, the range of merchant quality scores may be set at “0” to “1”. The initial merchant quality score for the merchant may be set at “1”. If the merchant has submitted 3 merchant values whose data quality scores average “0.7”, the merchant quality score of the merchant will be adjusted based on those data quality scores. For instance, the initial merchant quality score may be multiplied by the average data quality score to determine the final merchant quality score, here 1 × 0.7 = 0.7, which keeps the result within the 0 to 1 range. One skilled in the art will recognize that there are numerous ways to mathematically calculate the merchant quality score. Thus, the example used is presented for purposes of example only and should not be construed as limiting.
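The adjustment at block 410 can be sketched as a single function. This version multiplies the initial score by the average data quality score, which keeps the result within the 0 to 1 range; the function name, the clamping step, and the decision to leave the score unchanged when no data quality scores exist are assumptions.

```python
from statistics import mean
from typing import Sequence


def merchant_quality_score(initial: float, data_quality_scores: Sequence[float]) -> float:
    """Adjust a merchant's initial quality score by its average data quality."""
    if not data_quality_scores:
        return initial  # nothing submitted yet, so nothing to adjust
    adjusted = initial * mean(data_quality_scores)
    # Guard against out-of-range inputs so the score stays in [0, 1].
    return min(1.0, max(0.0, adjusted))


# Three merchant values averaging 0.7 move an initial score of 1 to 0.7.
score = merchant_quality_score(1.0, [0.6, 0.7, 0.8])
```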

In another embodiment, a subset of a merchant's data quality scores may be used to compute the merchant quality score. These subsets can be based on a number of different parameters. For instance, if a merchant has submitted ten merchant values, only statistically consistent scores may be used to determine the merchant quality score. For example, the best score and the worst score may be removed. Alternatively, any data quality scores deviating beyond a threshold amount from an average data quality score may be discarded. Still further, a “sliding window” of a pre-determined number of data quality scores may be used. For example, only the 50 latest merchant values may be used when calculating the average merchant quality score. In still another embodiment, another subset can be created based on the dates on which the merchant values were submitted by the merchant. For example, a merchant within the electronic marketplace for several years could go a long period without submitting merchant values. While the merchant's earlier submissions may have been high quality, its newer submissions may not be. Thus, by specifying a particular range of dates for merchant values and only using those in the determination of a merchant quality score, the electronic marketplace can identify trends in the quality of information provided by a merchant.
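Each subset-selection strategy above can be written as a small filter applied before averaging. The `ScoredSubmission` record, the function names, the example history, and the treatment of short histories are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import List


@dataclass
class ScoredSubmission:
    """Hypothetical record: when a merchant value arrived and how it scored."""
    submitted_on: date
    data_quality: float


def drop_best_and_worst(scores: List[float]) -> List[float]:
    """Remove one highest and one lowest score (kept as-is if fewer than 3)."""
    if len(scores) < 3:
        return list(scores)
    return sorted(scores)[1:-1]


def within_threshold(scores: List[float], threshold: float) -> List[float]:
    """Discard scores deviating from the average by more than `threshold`."""
    if not scores:
        return []
    average = mean(scores)
    return [s for s in scores if abs(s - average) <= threshold]


def latest(history: List[ScoredSubmission], n: int) -> List[float]:
    """Sliding window: scores of the n most recently submitted merchant values."""
    if n <= 0:
        return []
    ordered = sorted(history, key=lambda h: h.submitted_on)
    return [h.data_quality for h in ordered[-n:]]


def in_date_range(history: List[ScoredSubmission], start: date, end: date) -> List[float]:
    """Scores of merchant values submitted between start and end, inclusive."""
    return [h.data_quality for h in history if start <= h.submitted_on <= end]


history = [
    ScoredSubmission(date(2020, 1, 1), 0.9),
    ScoredSubmission(date(2023, 5, 1), 0.4),
    ScoredSubmission(date(2023, 6, 1), 0.5),
]
```

In this hypothetical history, the full set of scores averages 0.6 while the two most recent average 0.45, which is the kind of declining trend that a window or date range would expose.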

Returning to the routine 400, at decision block 412, a determination is made as to whether the current merchant is the last merchant within the electronic marketplace. If the merchant is not determined to be the last merchant, the routine 400 returns to block 406 to identify the next merchant. If the merchant is determined to be the last merchant, the routine 400 moves to block 414, where it terminates.

While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention. For example, in some embodiments, a merchant may be associated with multiple average data scores associated with various characteristics. For instance, a large merchant may offer a wide variety of products for sale from appliances to electronics. To ensure quality in the descriptions of those products, a merchant may have an average data score dependent on the category of the product. For instance, a merchant may be associated with an “appliances” average data score for merchant values related to appliances. Likewise, the merchant may have a separate “electronics” average data score for merchant values associated with electronics.
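Per-category ratings only require keying the stored scores by both merchant and product category. The class below is a hypothetical sketch; defaulting to 1.0 for a category with no history follows the optimistic initial score described with FIG. 4 and is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List, Tuple


class CategoryRatings:
    """Average data score per (merchant, product category) pair."""

    def __init__(self) -> None:
        self._scores: DefaultDict[Tuple[str, str], List[float]] = defaultdict(list)

    def record(self, merchant: str, category: str, data_quality: float) -> None:
        self._scores[(merchant, category)].append(data_quality)

    def average(self, merchant: str, category: str, default: float = 1.0) -> float:
        # .get() avoids creating an empty entry for an unseen category.
        scores = self._scores.get((merchant, category))
        return mean(scores) if scores else default


ratings = CategoryRatings()
ratings.record("merchant-1", "appliances", 0.9)
ratings.record("merchant-1", "electronics", 0.4)
ratings.record("merchant-1", "electronics", 0.6)
```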

In another embodiment, merchant value submissions may be associated with the specific individuals who submit descriptions for items. For example, a merchant could have two persons who are responsible for submitting descriptions of products. In this embodiment, the average data score is associated with both the individual and the merchant. Thus, the average data score of one individual would not affect another individual submitting merchant values on behalf of the same merchant.
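Tracking individuals works the same way with the submitting person added to the key, so that one person's scores never enter another's average. The sketch below is hypothetical: the class name, the running-total storage, the example individuals, and the default of 1.0 for an individual with no history are assumptions.

```python
from typing import Dict, Tuple


class IndividualRatings:
    """Running average data score per (merchant, individual) pair."""

    def __init__(self) -> None:
        # (merchant, individual) -> (sum of scores, number of scores)
        self._totals: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def record(self, merchant: str, individual: str, data_quality: float) -> None:
        total, count = self._totals.get((merchant, individual), (0.0, 0))
        self._totals[(merchant, individual)] = (total + data_quality, count + 1)

    def average(self, merchant: str, individual: str, default: float = 1.0) -> float:
        total, count = self._totals.get((merchant, individual), (0.0, 0))
        return total / count if count else default


ratings = IndividualRatings()
ratings.record("merchant-1", "alice", 0.9)
ratings.record("merchant-1", "bob", 0.3)
ratings.record("merchant-1", "bob", 0.5)
```

Storing a running sum and count, rather than every score, keeps memory constant per individual, at the cost of not supporting the windowed or date-ranged subsets described earlier.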

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
 1. A method, comprising: receiving via a network connection, by a value analyzer implemented on one or more computers of a distributed computing environment, a plurality of submissions from a plurality of entities, wherein the plurality of submissions each comprise a textual description of an attribute of a product; matching, by the value analyzer, a set of the plurality of submissions such that the set of matching submissions include textual descriptions of a same product; performing, by the value analyzer, a similarity comparison for each submission of the set of matching submissions, wherein performing the similarity comparison comprises: comparing a particular submission of the set of matching submissions to a plurality of other submissions of the set of matching submissions; and determining a plurality of data quality scores for the particular submission, wherein each data quality score of the plurality of data quality scores is based on a measure of similarity between the textual description of the attribute of the product included in the particular submission and a textual description of the attribute of the product included in a respective one of the other submissions of the set of matching submissions; determining, by the value analyzer, based on the plurality of data quality scores, an overall data quality score for the quality of the textual description of the attribute of the product included in the particular submission, wherein the particular submission is from a particular entity of the plurality of entities; rating, by a quality analyzer implemented on one or more computers of the distributed computing environment, the particular entity who submitted the particular submission, based, at least in part, on the overall data quality score of the textual description submitted from the particular entity, wherein said rating comprises: combining the overall data quality score for the textual description in the particular submission with one or more previously determined data quality scores of textual descriptions in submissions previously received from the same particular entity to determine a cumulative data quality score of textual descriptions in submissions received from the same particular entity, wherein the particular entity is distinct from the other entities of the plurality of entities; and assigning a rating to the particular entity, wherein the rating indicates the particular entity's propensity to provide quality product descriptions; selecting, by the value analyzer implemented on the one or more computers of the distributed computing environment, a textual description of the attribute of the product, to be provided for display from the plurality of textual descriptions of the attribute of the product included in the plurality of submissions based, at least in part, on the rating of the particular entity that submitted the particular submission; storing the selected textual description of the attribute of the product in a database of the distributed computing environment; and in response to a query from a client, transmitting by the server of the distributed computing environment via a network-based interface to the client, the selected textual description of the attribute of the product for display.
 2. The method of claim 1, wherein said determining the overall data quality score for the quality of the textual description of the attribute of the product included in the particular submission comprises: averaging the plurality of data quality scores to determine the overall data quality score for the quality of the textual description.
 3. The method of claim 1, further comprising: receiving, by the value analyzer, a second submission comprising another textual description of another attribute of a product from the particular entity; determining, by the value analyzer, a data quality score for the other textual description based, at least in part, on one or more other textual descriptions of the other attribute of the product submitted by other entities of the plurality of entities; and adjusting, by the quality analyzer, the particular entity's rating based, at least in part, on the data quality score for the second textual description.
 4. The method of claim 1, wherein said combining the overall data quality score for the textual description with the one or more previously determined data quality scores of textual descriptions in submissions previously received from the particular entity comprises excluding statistically inconsistent scores.
 5. The method of claim 1, wherein the one or more previously determined data quality scores of textual descriptions are selected based on a number of most recent submissions received from the particular entity.
 6. The method of claim 1, wherein the one or more previously determined data quality scores of textual descriptions are selected based on a date range for submissions received from the particular entity.
 7. The method of claim 1, wherein the one or more previously determined data quality scores of textual descriptions correspond to a selected product category.
 8. A system, comprising: a distributed computing environment, comprising: a server comprising one or more hardware processors and memory, and configured to provide a network-based interface for viewing product descriptions; and one or more computing devices comprising one or more hardware processors and memory, and configured to implement a value analyzer and a quality analyzer, wherein the value analyzer is configured to: receive via a network connection to the one or more computers implementing the value analyzer, a plurality of submissions from a plurality of entities, wherein the plurality of submissions each comprise a textual description of an attribute of a product; match a set of the plurality of submissions such that the submissions in the set of matched submissions each include textual descriptions of a same product; perform a similarity comparison for each submission of the set of matched submissions, wherein to perform the similarity comparison the value analyzer is configured to: compare a particular submission of the set of matched submissions to a plurality of other submissions of the set of matched submissions; and determine a plurality of data quality scores for the particular submission, wherein each data quality score of the plurality of data quality scores is based on a measure of similarity between the textual description of the attribute of the product included in the particular submission and a textual description of the attribute of the product included in a respective one of the other submissions of the set of matched submissions; calculate, based on the plurality of data quality scores, a calculated data quality score for the quality of the textual description of the attribute of the product included in the particular submission, wherein the particular submission is from a particular entity of the plurality of entities; wherein the quality analyzer is configured to: rate the particular entity based, at least in part, on the calculated data quality score of the textual description submitted from the particular entity, wherein to rate the particular entity, the quality analyzer is configured to: combine the calculated data quality score with one or more previously calculated data quality scores of textual descriptions in submissions previously received from the same particular entity to determine a cumulative data quality score of textual descriptions in submissions received from the same particular entity, wherein the particular entity is distinct from the other entities of the plurality of entities; and assign a rating to the particular entity, wherein the rating indicates the particular entity's propensity to provide quality product descriptions.
 9. The system of claim 8, wherein the quality analyzer is further configured to adjust the particular entity's rating based, at least in part, on a second data quality score calculated for a second textual description received in another submission from the particular entity.
 10. The system of claim 8, wherein the one or more previously determined data quality scores of textual descriptions are selected based on a number of most recent submissions received from the particular entity.
 11. The system of claim 8, wherein the one or more previously determined data quality scores of textual descriptions are selected based on a date range for submissions received from the particular entity.
 12. The system of claim 8, wherein the one or more previously determined quality scores of textual descriptions correspond to a selected product category.
 13. A non-transitory computer-readable storage device storing instructions that, in response to execution by a computer system, cause the computer system to perform operations comprising: receiving, via a network connection, a plurality of submissions from a plurality of entities, wherein the plurality of submissions each comprise a textual description of an attribute of a product; matching a set of the plurality of submissions such that the set of matching submissions include textual descriptions of a same product; performing a similarity comparison for each submission of the set of matching submissions, wherein performing the similarity comparison comprises: comparing a particular submission of the set of matching submissions to a plurality of other submissions of the set of matching submissions; and determining a plurality of data quality scores for the particular submission, wherein each data quality score of the plurality of data quality scores is based on a measure of similarity between the textual description of the attribute of the product included in the particular submission and a textual description of the attribute of the product included in a respective one of the other submissions of the set of matching submissions; calculating, based on the plurality of data quality scores, a calculated data quality score for the quality of the textual description of the attribute of the product included in the submission from the particular entity; rating the particular entity based, at least in part, on the calculated data quality score of the textual description submitted from the particular entity, wherein said rating comprises: combining the calculated data quality score with one or more previously calculated data quality scores of textual descriptions in submissions previously received from the same particular entity to determine a cumulative data quality score of textual descriptions in submissions received from the same particular entity, wherein the particular entity is distinct from other entities of the plurality of entities; and assigning a rating to the particular entity, wherein the rating indicates the particular entity's propensity to provide quality product descriptions; selecting a textual description of the attribute of the product, to be displayed via a network-based interface implemented via a server in a distributed computing environment that includes the computer system that performs the operations, from the plurality of textual descriptions of the attribute of the product included in the plurality of submissions based, at least in part, on the rating of the particular entity that submitted the particular submission.
 14. The non-transitory computer-readable storage device of claim 13, wherein the one or more previously determined data quality scores of textual descriptions are selected based on a number of most recent submissions received from the particular entity.
 15. The non-transitory computer-readable storage device of claim 13, wherein the one or more previously determined data quality scores of textual descriptions are selected based on a date range for submissions received from the particular entity.
 16. The non-transitory computer-readable storage device of claim 13, wherein the one or more previously determined data quality scores of textual descriptions correspond to a selected product category. 